Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
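Posterior collapse as described above can be checked directly when the variational posterior is a diagonal Gaussian, as is standard for variational autoencoders: the posterior has collapsed exactly when KL(q(z|x) || p(z)) is (near) zero for every input. A minimal sketch of that diagnostic, using the closed-form Gaussian KL (this is a standard identity, not this paper's contribution, and the numbers below are toy values):

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """Per-example KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# A collapsed posterior matches the prior: KL is zero for every input.
mu_collapsed = np.zeros((4, 2))
lv_collapsed = np.zeros((4, 2))          # log variance 0 -> variance 1
print(gaussian_kl_to_standard_normal(mu_collapsed, lv_collapsed))  # [0. 0. 0. 0.]

# An informative posterior differs from the prior: KL is bounded away from zero.
mu_active = np.array([[1.0, -1.0]])
lv_active = np.array([[-1.0, -1.0]])
print(gaussian_kl_to_standard_normal(mu_active, lv_active))
```

Monitoring this quantity per dimension also reveals partial collapse, where only some latent coordinates ignore the data.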
This paper presents probabilistic conformal prediction (PCP), a predictive inference algorithm that estimates a target variable by a possibly discontinuous prediction set. Given an input, PCP constructs the prediction set based on random samples from an estimated generative model. It is efficient and compatible with either explicit or implicit conditional generative models. Theoretically, we show that PCP guarantees correct marginal coverage with finite samples. Empirically, we study PCP on a variety of simulated and real datasets. Compared to existing conformal inference methods, PCP provides sharper prediction sets.
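A minimal sketch of the sampling-based construction described above, using a split-conformal calibration step. The `sample_model` function is a hypothetical stand-in for the fitted conditional generative model, and the details (nearest-sample scores, union of intervals) are a simplification rather than the paper's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_model(x, k):
    """Hypothetical fitted generative model: here, a toy Gaussian around sin(x)."""
    return np.sin(x) + 0.1 * rng.standard_normal(k)

# Calibration: score each held-out point by its distance to the nearest sample.
alpha, k = 0.1, 50
x_cal = rng.uniform(0, 3, size=200)
y_cal = np.sin(x_cal) + 0.1 * rng.standard_normal(200)
scores = np.array([np.min(np.abs(y - sample_model(x, k)))
                   for x, y in zip(x_cal, y_cal)])
# Split-conformal quantile with finite-sample correction.
q = np.quantile(scores, np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores))

def prediction_set(x_new):
    """Union of radius-q intervals around k fresh samples (may be disconnected)."""
    centers = sample_model(x_new, k)
    return [(c - q, c + q) for c in centers]

bands = prediction_set(1.5)
```

Because the set is a union of small balls around samples, it can be disconnected, which is what lets it be sharper than interval-valued conformal methods when the predictive distribution is multimodal.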
Variational inference often minimizes the "reverse" Kullback-Leibler (KL) divergence KL(q||p) from the approximating distribution q to the posterior p. Recent work has studied the "forward" KL, KL(p||q), which unlike the reverse KL does not lead to variational approximations that underestimate uncertainty. This paper introduces Transport Score Climbing (TSC), a method that optimizes KL(p||q) by using Hamiltonian Monte Carlo (HMC) and a novel adaptive transport map. The transport map improves the trajectory of HMC by acting as a change of variables between the latent variable space and a warped space. TSC uses HMC samples to dynamically train the transport map while optimizing KL(p||q). TSC leverages a synergy in which better transport maps lead to better HMC sampling, which in turn leads to better transport maps. We demonstrate TSC on synthetic and real data, and find that it achieves competitive performance when training variational autoencoders on large-scale data.
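The core of score climbing is that minimizing the forward KL(p||q) over a parametric family, given draws from p, reduces to maximum likelihood on those draws. A minimal sketch of that reduction and of why the forward KL does not underestimate uncertainty, with i.i.d. samples standing in for HMC draws and the transport-map adaptation omitted entirely:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for HMC draws from the posterior p: a bimodal 1-D target.
samples = np.concatenate([rng.normal(-2.0, 0.5, 5000),
                          rng.normal(2.0, 0.5, 5000)])

# Minimizing KL(p || q) over Gaussians q = N(m, s^2) is maximum likelihood
# on draws from p, i.e. moment matching.
m, s = samples.mean(), samples.std()

# The forward-KL fit covers both modes (s near sqrt(2^2 + 0.5^2), about 2.06),
# whereas a reverse-KL fit would typically lock onto one mode with s near 0.5.
print(m, s)
```

This mode-covering behavior is exactly the sense in which forward-KL approximations avoid the underestimated uncertainty of reverse-KL fits.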
We present neural activation coding (NAC) as a novel approach for learning deep representations from unlabeled data for downstream applications. We argue that a deep encoder should maximize its nonlinear expressivity on the data so that downstream predictors can take full advantage of its representation power. To this end, NAC maximizes the mutual information between the activation patterns of the encoder and the data over a noisy communication channel. We show that learning a noise-robust activation code increases the number of distinct linear regions of ReLU encoders, and hence the maximum nonlinear expressivity. NAC learns both continuous and discrete representations of data, which we evaluate on two downstream tasks respectively: (i) linear classification on CIFAR-10 and ImageNet-1K and (ii) nearest-neighbor retrieval on CIFAR-10 and FLICKR-25K. Empirical results show that NAC attains better or comparable performance on both tasks over recent baselines, including SimCLR and DistillHash. In addition, NAC pretraining provides significant benefits to the training of deep generative models. Our code is available at https://github.com/yookoon/nac.
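The quantity NAC argues to increase, the number of distinct linear regions of a ReLU encoder, can be lower-bounded empirically by counting distinct activation patterns on sampled inputs, since each pattern of firing units fixes one affine piece of the network. A small sketch of that counting on a random toy encoder (the architecture and sampling range here are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def activation_pattern(x, weights):
    """Binary code: which ReLU units fire at input x, across all hidden layers."""
    pattern = []
    h = x
    for w, b in weights:
        pre = h @ w + b
        pattern.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return tuple(np.concatenate(pattern).tolist())

# A small random ReLU encoder: 2 -> 16 -> 16.
weights = [(rng.standard_normal((2, 16)), rng.standard_normal(16)),
           (rng.standard_normal((16, 16)), rng.standard_normal(16))]

# Each distinct pattern corresponds to one linear region of the encoder;
# counting distinct patterns on sampled inputs lower-bounds the region count.
inputs = rng.uniform(-3, 3, size=(2000, 2))
n_regions = len({activation_pattern(x, weights) for x in inputs})
print(n_regions)
```

The activation pattern is also the discrete code that NAC's retrieval experiments exploit.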
Sequence models are a critical component of modern NLP systems, but their predictions are difficult to explain. We consider model explanations through rationales, subsets of the context that can explain individual model predictions. We find sequential rationales by solving a combinatorial optimization: the best rationale is the smallest subset of input tokens that would predict the same output as the full sequence. Enumerating all subsets is intractable, so we propose an efficient greedy algorithm to approximate this objective. The algorithm, called greedy rationalization, applies to any model. For this approach to be effective, the model should form compatible conditional distributions when making predictions on incomplete subsets of the context; this condition can be enforced with a short fine-tuning step. We study greedy rationalization on language modeling and machine translation. Compared to existing baselines, greedy rationalization is best at optimizing the combinatorial objective and provides the most faithful rationales. On a new dataset of annotated sequential rationales, greedy rationales are most similar to human rationales.
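A minimal sketch of the greedy loop described above: start from the empty context, repeatedly add the token that most increases the probability of the full-sequence prediction, and stop once the subset yields that same prediction. The `toy_predict` "model" below is a hypothetical keyword-based sentiment scorer, not anything from the paper:

```python
import math

def greedy_rationale(n_tokens, predict):
    """Greedy approximation of the smallest context subset whose prediction
    matches the full sequence's. `predict(ids)` returns {label: prob}."""
    full = predict(frozenset(range(n_tokens)))
    target = max(full, key=full.get)
    rationale = frozenset()
    while max(predict(rationale), key=predict(rationale).get) != target:
        # Add the single token that most increases the target's probability.
        best = max((i for i in range(n_tokens) if i not in rationale),
                   key=lambda i: predict(rationale | {i})[target])
        rationale = rationale | {best}
    return sorted(rationale)

# A hypothetical sentiment "model" over the sequence "the movie was not bad".
TOKENS = ["the", "movie", "was", "not", "bad"]

def toy_predict(ids):
    score = 3.0 * (3 in ids) - 2.0 * (4 in ids) - 0.5
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return {"pos": p_pos, "neg": 1.0 - p_pos}

rationale = greedy_rationale(len(TOKENS), toy_predict)
print([TOKENS[i] for i in rationale])  # ['not']
```

Each greedy step costs one model call per remaining candidate token, which is what makes the approximation tractable where subset enumeration is not.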
Bayesian modeling helps applied researchers articulate assumptions about their data and develop models tailored to specific applications. Thanks to good methods for approximate posterior inference, researchers can now easily build, use, and revise complicated Bayesian models for large and rich data. These capabilities, however, bring into focus the problem of model criticism: researchers need tools to diagnose the fitness of their models, to understand where they fall short, and to guide their revision. In this paper, we develop a new method for Bayesian model criticism, the population predictive check (Pop-PC). Pop-PCs build on posterior predictive checks (PPCs), a seminal method that checks a model by assessing the posterior predictive distribution on the observed data. However, PPCs use the data twice, both to compute the posterior predictive and to evaluate it, which can lead to overconfident assessments of model quality. Pop-PCs, in contrast, compare the posterior predictive distribution to a draw from the population distribution, a held-out dataset. This method blends Bayesian modeling with frequentist assessment. Unlike the PPC, we prove that the Pop-PC is properly calibrated. Empirically, we study Pop-PCs on classical regression and a hierarchical model of text data.
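The contrast between the two checks can be sketched in a few lines for a conjugate normal model: the same predictive p-value is computed twice, once against the data that fit the posterior (the classic PPC, double use of data) and once against a held-out draw from the population (the Pop-PC idea). The model, statistic, and sample sizes below are toy choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy setting: y ~ N(theta, 1) with a N(0, 1) prior on theta (conjugate).
y_obs = rng.normal(1.0, 1.0, 50)      # data used to fit the posterior
y_pop = rng.normal(1.0, 1.0, 50)      # held-out draw from the population

n = len(y_obs)
post_var = 1.0 / (1.0 + n)
post_mean = post_var * y_obs.sum()

def check(y_ref, statistic=np.mean, draws=4000):
    """P( T(y_rep) >= T(y_ref) ) under the posterior predictive."""
    theta = rng.normal(post_mean, np.sqrt(post_var), draws)
    y_rep = rng.normal(theta[:, None], 1.0, (draws, n))
    t_rep = statistic(y_rep, axis=1)
    return np.mean(t_rep >= statistic(y_ref))

ppc_pval = check(y_obs)   # classic PPC: reuses y_obs, tends toward 0.5
pop_pval = check(y_pop)   # population check: evaluates on held-out data
print(ppc_pval, pop_pval)
```

The PPC p-value concentrates near 0.5 here because the posterior predictive was fit to the very data being checked; the held-out comparison avoids that double use, which is the calibration property the paper proves.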
One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms.
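The recipe in this abstract, posit a family of densities and find the member closest in Kullback-Leibler divergence to the target, can be made concrete with Gaussians, where the KL has a closed form. A minimal sketch (the grid search stands in for the optimization; real VI would use coordinate ascent or stochastic gradients):

```python
import numpy as np

def kl_gaussians(m_q, s_q, m_p, s_p):
    """KL( N(m_q, s_q^2) || N(m_p, s_p^2) ), the closeness measure used in VI."""
    return np.log(s_p / s_q) + (s_q**2 + (m_q - m_p)**2) / (2 * s_p**2) - 0.5

# Posit a family { N(m, s^2) } and pick the member closest in KL to the
# target N(1, 2^2); here the family contains the target, so VI recovers it.
grid_m = np.linspace(-3, 3, 61)
grid_s = np.linspace(0.5, 4, 36)
kl = np.array([[kl_gaussians(m, s, 1.0, 2.0) for s in grid_s] for m in grid_m])
i, j = np.unravel_index(kl.argmin(), kl.shape)
print(grid_m[i], grid_s[j])  # approximately 1.0 2.0
```

When the family does not contain the target, the same procedure returns the best approximation within the family, which is the typical situation in Bayesian applications.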
Variational inference has become a widely used method to approximate posteriors in complex latent variables models. However, deriving a variational inference algorithm generally requires significant model-specific analysis, and these efforts can hinder and deter us from quickly developing and exploring a variety of models for a problem at hand. In this paper, we present a "black box" variational inference algorithm, one that can be quickly applied to many models with little additional derivation. Our method is based on a stochastic optimization of the variational objective where the noisy gradient is computed from Monte Carlo samples from the variational distribution. We develop a number of methods to reduce the variance of the gradient, always maintaining the criterion that we want to avoid difficult model-based derivations. We evaluate our method against the corresponding black box sampling based methods. We find that our method reaches better predictive likelihoods much faster than sampling methods. Finally, we demonstrate that Black Box Variational Inference lets us easily explore a wide space of models by quickly constructing and evaluating several models of longitudinal healthcare data.
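The noisy gradient described above is the score-function estimator: the gradient of the ELBO is written as an expectation under q of the score of q times the log-ratio log p(x, z) - log q(z), and approximated with Monte Carlo samples from q. A minimal one-parameter sketch, with a toy log joint whose posterior is N(2, 1) and a variational family with unit variance (the model and family are illustrative, and this version omits the paper's variance-reduction methods):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy model: log joint with posterior N(2, 1); variational family q = N(lam, 1).
log_p = lambda z: -0.5 * (z - 2.0) ** 2
log_q = lambda z, lam: -0.5 * (z - lam) ** 2
score = lambda z, lam: z - lam          # d/dlam log q(z | lam)

lam, lr, n_samples = -3.0, 0.05, 200
for step in range(2000):
    z = rng.normal(lam, 1.0, n_samples)          # sample from q
    grad = np.mean(score(z, lam) * (log_p(z) - log_q(z, lam)))  # noisy ELBO grad
    lam += lr * grad                             # stochastic ascent
print(lam)  # converges near 2.0
```

Note that nothing here required differentiating through the model, only evaluating log p and log q at samples, which is what makes the method "black box": it applies to any model whose log joint can be computed pointwise.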
We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
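The update pattern behind stochastic variational inference can be sketched for a toy conjugate model: subsample one data point, form an intermediate estimate of the global variational parameters as if the whole corpus looked like that point, and move the current estimate toward it with a decreasing step size. The model, step-size schedule, and iteration counts below are illustrative stand-ins for the topic-model setting of the paper:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy conjugate model: x_i ~ N(theta, 1), theta ~ N(0, 1). The exact posterior
# has natural parameters eta = (sum_i x_i, -(N + 1) / 2) in the Gaussian family.
N = 100_000
x = rng.normal(0.7, 1.0, N)             # a "massive" data set, streamed below
eta1, eta2 = 0.0, -0.5                  # initialize at the prior

for t in range(1, 20001):
    i = rng.integers(N)                 # subsample one data point
    # Intermediate estimate: pretend the whole corpus looks like x[i].
    hat1, hat2 = N * x[i], -(N + 1) / 2
    rho = (t + 10.0) ** -0.7            # decreasing step size (Robbins-Monro)
    eta1 = (1 - rho) * eta1 + rho * hat1
    eta2 = (1 - rho) * eta2 + rho * hat2

post_mean = eta1 / (-2 * eta2)          # back to mean parameterization
print(post_mean)  # close to the exact posterior mean, sum(x) / (N + 1)
```

Each iteration touches a single data point, so the cost per step is independent of N; that is what lets the method scale to corpora like the 3.8M Wikipedia articles while batch variational inference cannot.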
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.